Today's robots often interface data-driven perception and planning models with classical model-predictive controllers (MPC). Such learned perception/planning models frequently produce erroneous waypoint predictions on out-of-distribution (OoD) or even adversarial visual inputs, which increase control costs. However, today's methods to train robust perception models are largely task-agnostic: they augment a dataset using random image transformations or adversarial examples targeted at the vision model in isolation. As such, they often introduce pixel perturbations that are ultimately benign for control. In contrast to prior work that synthesizes adversarial examples for single-step vision tasks, our key contribution is to synthesize adversarial scenarios tailored to multi-step, model-based control. To do so, we use differentiable MPC methods to calculate the sensitivity of a model-based controller to errors in state estimation. We show that re-training vision models on these adversarial datasets improves control performance on OoD test scenarios by up to 36.2% compared to standard task-agnostic data augmentation. We demonstrate our method on examples of robotic navigation, manipulation in RoboSuite, and control of an autonomous air vehicle.
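The central quantity here is how sensitive the closed-loop control cost is to an error in the state estimate. What follows is a minimal sketch under loudly stated assumptions: a scalar linear system, a finite-horizon LQR controller (the unconstrained special case of MPC), and a constant perception bias `err` added to the state the controller sees. All function names and parameter values are hypothetical, and central finite differences stand in for the differentiable-MPC gradients the abstract describes.

```python
def lqr_gains(a, b, q, r, horizon):
    """Finite-horizon scalar LQR gains from the backward Riccati recursion."""
    p = q  # terminal cost weight, assumed equal to the stage weight q
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * k
        gains.append(k)
    gains.reverse()  # gains[t] is now the feedback gain for step t
    return gains


def closed_loop_cost(x0, err, a=1.0, b=0.5, q=1.0, r=0.1, horizon=20):
    """True quadratic cost when the controller acts on a biased estimate x + err."""
    x, cost = x0, 0.0
    for k in lqr_gains(a, b, q, r, horizon):
        u = -k * (x + err)          # perception error enters through the state estimate
        cost += q * x * x + r * u * u
        x = a * x + b * u           # the plant evolves from the true state
    return cost + q * x * x         # terminal cost


def cost_sensitivity(x0, err, h=1e-5, **params):
    """dJ/d(err) by central finite differences (a stand-in for differentiable MPC)."""
    return (closed_loop_cost(x0, err + h, **params)
            - closed_loop_cost(x0, err - h, **params)) / (2 * h)


print(closed_loop_cost(1.0, 0.0), closed_loop_cost(1.0, 0.5))
print(cost_sensitivity(1.0, 0.5))
```

Because the LQR gains are optimal for the unbiased estimate, the cost is minimized (and its gradient vanishes) at `err = 0`; the size of the gradient away from zero indicates how much a given estimation error actually hurts control, as opposed to being benign.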
Transformers have made remarkable progress in modeling long-range dependencies within the medical image analysis domain. However, current transformer-based models suffer from several disadvantages: (1) existing methods fail to capture the important features of the images due to their naive tokenization scheme; (2) the models suffer from information loss because they only consider single-scale feature representations; and (3) the segmentation label maps generated by the models are not accurate enough because they do not consider rich semantic contexts and anatomical textures. In this work, we present CASTformer, a novel type of adversarial transformer, for 2D medical image segmentation. First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations. We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures. Lastly, we utilize an adversarial training strategy that boosts segmentation accuracy and correspondingly allows a transformer-based discriminator to capture high-level semantically correlated contents and low-level anatomical features. Our experiments demonstrate that CASTformer substantially outperforms previous state-of-the-art transformer-based approaches on three benchmarks, obtaining 2.54%-5.88% absolute improvements in Dice over previous models. Further qualitative experiments provide a more detailed picture of the model's inner workings, shed light on the challenges of improving transparency, and demonstrate that transfer learning can greatly improve performance and reduce the amount of medical image data needed for training, making CASTformer a strong starting point for downstream medical image analysis tasks.
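The reported gains are absolute differences in the Dice coefficient, which measures overlap between a predicted mask P and a ground-truth mask T as 2|P ∩ T| / (|P| + |T|). A minimal sketch on flattened label maps follows; averaging over foreground classes and scoring a class absent from both maps as 1.0 are common conventions assumed here, not details taken from the benchmarks, and the label maps are hypothetical.

```python
def dice(pred, target, label):
    """Dice coefficient 2|P & T| / (|P| + |T|) for one class over flattened label maps."""
    p = [v == label for v in pred]
    t = [v == label for v in target]
    inter = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    return 1.0 if denom == 0 else 2.0 * inter / denom  # empty-in-both counts as perfect


def mean_dice(pred, target, labels):
    """Average the per-class Dice over the given (foreground) labels."""
    return sum(dice(pred, target, c) for c in labels) / len(labels)


pred = [0, 1, 1, 2, 2, 0]    # hypothetical predicted label map (flattened)
target = [0, 1, 2, 2, 2, 0]  # hypothetical ground truth
print(mean_dice(pred, target, labels=[1, 2]))
```

Here class 1 scores 2/3 and class 2 scores 0.8, so a "5.88% absolute improvement" would mean a rise of 0.0588 in this kind of averaged score.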
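Autonomous vehicles (AVs) must interact with a diverse set of human drivers in heterogeneous geographic regions. Ideally, fleets of AVs should share trajectory data to continually re-train and improve trajectory prediction models from their collective experience using cloud-based distributed learning. At the same time, these robots should ideally avoid uploading raw driver interaction data, in order to protect proprietary policies (when sharing insights with other companies) or to protect driver privacy. Federated learning (FL) is a popular mechanism for learning models on a cloud server from diverse users without leaking private local data. However, FL is often not robust: it learns sub-optimal models when user data comes from highly heterogeneous distributions, which is a key hallmark of human-robot interaction. In this paper, we present a variant of personalized FL that specializes robust robot learning models to diverse user distributions. Our algorithm outperforms standard FL benchmarks by 2x in real user studies we conducted, in which human-operated vehicles must gracefully merge with simulated AVs in the standard CARLA and CARLO AV simulators.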
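In FL, clients send model updates to the server rather than raw data. The sketch below shows plain federated averaging (FedAvg) on a hypothetical one-parameter model with squared loss, followed by local fine-tuning, a simple and widely used personalization baseline. It is not the paper's personalized-FL algorithm; the model, learning rate, step counts, and client data are all assumptions chosen to show why a single averaged model can fit heterogeneous clients poorly.

```python
def local_update(theta, data, lr=0.1, steps=5):
    """A few gradient steps on the local loss 0.5 * mean((theta - y)^2)."""
    for _ in range(steps):
        grad = sum(theta - y for y in data) / len(data)
        theta -= lr * grad
    return theta


def fedavg(clients, rounds=50, theta=0.0):
    """Federated averaging: the server only ever sees updated parameters."""
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        updates = [local_update(theta, d) for d in clients]
        # Weight each client's update by its number of local samples.
        theta = sum(len(d) * u for d, u in zip(clients, updates)) / total
    return theta


def personalize(theta_global, data, steps=20):
    """Hypothetical personalization step: fine-tune the global model locally."""
    return local_update(theta_global, data, steps=steps)


clients = [[1.0, 1.2, 0.8], [5.0, 5.5, 4.5, 5.0]]  # hypothetical heterogeneous clients
global_theta = fedavg(clients)
local_theta = personalize(global_theta, clients[0])
print(global_theta, local_theta)
```

The global parameter settles near the pooled mean (23/7), which suits neither client; the fine-tuned copy moves back toward the first client's own mean of 1.0.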
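Local differential privacy (LDP) can be adopted to anonymize richer user data attributes that are fed into sophisticated machine learning (ML) tasks. However, today's LDP approaches are largely task-agnostic and often lead to severe performance loss: they simply inject noise into all data attributes according to a given privacy budget, regardless of which features are most relevant for the ultimate task. In this paper, we address how to significantly improve ultimate task performance with multi-dimensional user data by considering a task-aware privacy preservation problem. The key idea is to use an encoder-decoder framework to learn (and anonymize) a task-relevant latent representation of user data. We obtain an analytical near-optimal solution for the linear setting with mean-squared error (MSE) task loss. We also provide an approximate solution through a gradient-based learning algorithm for general nonlinear cases. Extensive experiments demonstrate that, compared to standard benchmark LDP approaches, our task-aware approach significantly improves ultimate task accuracy under the same privacy guarantee.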
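The intuition behind task awareness can be shown with the Laplace mechanism, which adds Laplace noise of scale (L1 sensitivity) / epsilon. A minimal sketch, assuming every attribute is scaled to [0, 1] and the task is a known linear function w·x: perturbing all d attributes costs a sensitivity of d, while releasing only the task-relevant scalar w·x costs a sensitivity of sum(|w_i|). The fixed vector `w` is a hand-picked stand-in for the representation the paper learns with an encoder-decoder; all names and values are hypothetical.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def task_agnostic_release(x, eps, rng):
    """Perturb every attribute in [0, 1]; the L1 sensitivity of the full vector is d."""
    d = len(x)
    return [xi + laplace(d / eps, rng) for xi in x]


def task_aware_release(x, w, eps, rng):
    """Release only the task-relevant scalar w.x; its L1 sensitivity is sum(|w_i|)."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return s + laplace(sum(abs(wi) for wi in w) / eps, rng)


def expected_task_mse(w, eps):
    """Closed-form MSE for estimating w.x under each release (Laplace variance = 2 b^2)."""
    d = len(w)
    agnostic = sum(wi * wi for wi in w) * 2 * (d / eps) ** 2
    aware = 2 * (sum(abs(wi) for wi in w) / eps) ** 2
    return agnostic, aware


rng = random.Random(0)
x = [0.2, 0.4, 0.6, 0.8]  # hypothetical user record, attributes scaled to [0, 1]
w = [1.0, 0.5, 0.0, 0.0]  # hypothetical linear task: only two attributes matter
print(task_agnostic_release(x, 1.0, rng))
print(task_aware_release(x, w, 1.0, rng))
print(expected_task_mse(w, eps=1.0))
```

Both releases satisfy the same epsilon-LDP guarantee, yet with this `w` the expected task error drops from 40.0 to 4.5, because no budget is spent on the two irrelevant attributes.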
Accurately determining the binding pose of a small-molecule candidate (ligand) in its target protein pocket is important for computer-aided drug discovery. Typical rigid-body docking methods ignore the flexibility of the protein pocket, while more accurate pose generation using molecular dynamics is hindered by slow protein dynamics. We develop a tiered tensor transform (3T) algorithm that rapidly generates diverse protein-ligand complex conformations for both pose and affinity estimation in drug screening. It requires neither machine learning training nor lengthy dynamics computation, while maintaining both coarse-grain-like coordinated protein dynamics and atomistic-level details of the complex pocket. The 3T conformation structures we generate are closer to experimental co-crystal structures than those generated by docking software and, more importantly, achieve significantly higher accuracy in active ligand classification than traditional ensemble docking using hundreds of experimental protein conformations. The 3T structure transformation is decoupled from the system physics, making future use in other computational science domains possible.
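The abstract states that 3T structures are "closer to experimental co-crystal structures" without naming a metric in this summary; root-mean-square deviation (RMSD) over matched atoms is a common way to quantify such closeness, and is assumed here purely for illustration. A minimal sketch, assuming both structures are already expressed in the same reference frame with atoms listed in the same order:

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered sets of 3D coordinates.

    No superposition is performed: both structures must already share a frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]     # hypothetical generated ligand pose
crystal = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # hypothetical co-crystal reference
print(rmsd(pose, crystal))
```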
Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to their prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches often attribute posterior collapse to the use of neural networks or to optimization issues arising from the variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models that enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
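The claim that collapse can happen even with exact inference can be checked on a toy discrete model. In this hypothetical two-valued latent model (our illustration, not one from the paper), if p(x | z) is the same for every z, then z cannot be recovered from x (it is non-identifiable), and exact Bayes' rule returns the prior unchanged; when the likelihoods differ, the posterior moves.

```python
from fractions import Fraction


def posterior(prior, likelihood, x):
    """Exact Bayes' rule over a discrete latent: p(z | x) is proportional to p(z) p(x | z)."""
    joint = {z: prior[z] * likelihood[z][x] for z in prior}
    evidence = sum(joint.values())
    return {z: p / evidence for z, p in joint.items()}


prior = {0: Fraction(1, 3), 1: Fraction(2, 3)}

# Non-identifiable: both latent values induce the same distribution over x.
collapsed = {0: {"a": Fraction(1, 4), "b": Fraction(3, 4)},
             1: {"a": Fraction(1, 4), "b": Fraction(3, 4)}}

# Identifiable: the latent values induce different distributions over x.
identifiable = {0: {"a": Fraction(3, 4), "b": Fraction(1, 4)},
                1: {"a": Fraction(1, 4), "b": Fraction(3, 4)}}

print(posterior(prior, collapsed, "a"))     # equals the prior
print(posterior(prior, identifiable, "a"))  # differs from the prior
```

Exact rational arithmetic via `Fraction` makes the equality with the prior hold exactly rather than up to floating-point error.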
Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based Neural Architecture Search (NAS) method. Since the introduction of DARTS, there has been little work on adapting its search space based on state-of-the-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. To this end, we introduce the Pseudo-Inverted Bottleneck conv block, intended to reduce the computational footprint of the inverted bottleneck block proposed in ConvNeXt. Our proposed architecture is much less sensitive to evaluation layer count and significantly outperforms a DARTS network of similar size at layer counts as small as 2. Furthermore, with fewer layers, it not only achieves higher accuracy with lower GMACs and parameter count, but GradCAM comparisons also show that it detects distinctive features of target objects better than DARTS.
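DARTS makes architecture search gradient-based by relaxing the discrete choice of operation on each edge into a softmax-weighted mixture of all candidate operations, with architecture parameters alpha learned by gradient descent and the strongest operation kept afterwards. A minimal sketch of that relaxation on a scalar "feature map"; the three candidate operations are hypothetical stand-ins for real layers (convolutions, pooling, skip connections), the alpha optimization itself is omitted, and the paper's Pseudo-Inverted Bottleneck block is not modeled.

```python
import math


def softmax(alphas):
    """Numerically stable softmax over architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate operations on a scalar feature (insertion order matters).
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "zero": lambda x: 0.0,
    "double": lambda x: 2.0 * x,  # stand-in for a parametric op such as a conv
}


def mixed_op(x, alphas):
    """Continuous relaxation: softmax(alpha)-weighted sum of every candidate op."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def derive_op(alphas):
    """After search, discretize by keeping the op with the largest alpha."""
    names = list(CANDIDATE_OPS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]


print(mixed_op(1.0, [0.0, 0.0, 0.0]))  # uniform mixture of the three ops
print(derive_op([0.1, -1.0, 2.0]))
```

Because `mixed_op` is differentiable in `alphas`, the choice of operation can be optimized jointly with network weights, which is what augmenting the search space with new blocks plugs into.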